Abstract: Single nucleotide polymorphisms (SNPs) are widely employed in the studies of 
population genetics, molecular breeding and conservation genetics. In this study, we 
explored a simple route to develop SNPs from non-model species based on screening the 
library of single copy nuclear genes (SCNGs). Through application of this strategy in 
Panax, we identified 160 and 171 SNPs from P. quinquefolium and P. ginseng, respectively. 
Our results demonstrated that both P. ginseng and P. quinquefolium possessed a high level 
of nucleotide diversity. The number of haplotypes per locus ranged from 1 to 12 for 
P. ginseng and from 1 to 9 for P. quinquefolium. The nucleotide diversity of 
total sites (π_T) varied between 0.000 and 0.023 for P. ginseng and between 0.000 and 0.035 for 
P. quinquefolium. These findings suggest that this approach is well suited 
for SNP discovery in non-model organisms and is easily employed in standard genetics 
laboratory studies. 

Keywords: conservation genetics; Panax ginseng; Panax quinquefolium; single copy 
nuclear gene 
1. Introduction 

Detecting and assessing the genetic variation of a given species is one of the fundamental issues in 
biology. Since Mendel first used phenotype-based genetic markers in his experiments, the 
identification and use of genetic markers have made great progress [1]. 
In particular, a series of molecular markers has been developed owing to advances in molecular 
technologies. For example, restriction fragment length polymorphism (RFLP) was the first DNA marker 
to provide an efficient molecular tool for evaluating the genetic variation of a species [2]. This 
hybridization-based technique is widely used to detect DNA polymorphisms because it is 
relatively highly polymorphic, co-dominantly inherited and highly reproducible. In addition, the 
development of polymerase chain reaction (PCR)-based molecular markers, such as random amplified 
polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP) and microsatellites, has 
supplied an array of approaches that reveal a large amount of genetic variation in different organisms [3,4]. 
For instance, microsatellite markers are broadly employed as reliable DNA markers for multiple 
purposes across a wide range of species, including QTL tagging, population genetics, molecular 
breeding and phylogenetic analysis [5-7]. 

In recent years, however, the availability of abundant genetic resources for numerous organisms is 
contributing to a transition to the use of single nucleotide polymorphisms (SNPs) [8]. In particular, 
recent progress in the cost and accuracy of high-throughput sequencing technologies is 
revolutionizing the opportunities for producing genetic resources in different organisms [9]. 
For example, Geraldes et al. [10] identified 0.5 million putative SNPs in 26,595 genes of the 
model species black cottonwood (Populus trichocarpa) using high-throughput sequencing technology. 
Similarly, Howe et al. [11] characterized 278,979 unique SNPs from the non-model species 
Pseudotsuga menziesii by screening a reference transcriptome. Notably, although next-generation 
sequencing platforms have generated a large number of SNPs in both model and non-model 
organisms, some of these DNA polymorphisms are located in duplicated regions of the genome 
(i.e., different members of the same gene family), which may give rise to paralogous sequence 
variants (PSVs) and eventually limit the utility of the SNPs. Therefore, there is an urgent need to 
develop reliable SNPs from single copy nuclear genes (SCNGs) that could be used for applications 
such as molecular phylogenetics and genetic mapping. To this end, we explored a simple and 
straightforward approach to characterize SNPs from Panax ginseng C.A. Meyer and P. quinquefolium L. 
by screening the constructed library of Arabidopsis SCNGs. Panax L. (Araliaceae), commonly known 
as ginseng, is a medicinally important genus in the Orient and includes 18 species with 16 from eastern 
Asia and two from eastern North America [12,13]. P. ginseng is one of the highest valued medicinal 
species within Panax. Although P. ginseng was widely distributed in Russia, Korea and China at the 
beginning of the 20th century, only a few individuals now remain in natural environments owing to the 
overexploitation of wild resources and the destruction of natural habitats [14,15]. To date, P. ginseng has 
been listed as a rare and endangered plant in China [16]. Similarly, P. quinquefolium L. (American 
ginseng) is a medicinal plant that is native to North America and widely cultivated in China [17]. 
Results from molecular phylogenetic analyses revealed that P. ginseng and P. quinquefolium are the most 
closely related species within this genus [12,13]. To explore reliable SNPs from P. ginseng and 
P. quinquefolium, we developed 16 single copy nuclear genes (SCNGs) from the Panax dbEST of 
GenBank (http://www.ncbi.nlm.nih.gov/dbEST/index.html) (1 October 2012) [18]. These SCNGs may 
provide a series of useful molecular markers for future studies of conservation genetics. 

2. Results and Discussion 

2.1. Development of SNPs from SCNG Library 

The predominant type of molecular genetic marker has changed substantially over the past 
decades [8]. To date, SNP markers have come to prominence due to the abundant polymorphism in 
genomes, low-scoring error rates and relative ease of calibration among laboratories [9]. Specifically, 
with the advances in DNA sequencing technologies, SNP markers have contributed greatly to the 
genetic studies of model organisms. A large number of SNPs have been retrieved from Arabidopsis, 
Oryza and Populus using high-throughput DNA sequencing platforms [19-22]. 
Nonetheless, the application of SNPs in non-model species has lagged behind because of the limitations of 
marker development and the existence of PSVs [23]. Although transcriptome sequencing and 
reduced-representation genomic library sequencing have also generated a number of SNPs from different 
non-model organisms, using these SNP discovery approaches as standard tools in non-model 
species remains challenging [24-27]. The main stumbling block hindering wide adoption of 
SNPs in non-model organisms is that these next-generation sequencing-based approaches 
are too expensive for population-level analyses, particularly for studies with large sample sizes, 
and that the short reads sometimes cannot all be assembled without a reference genome. 
In addition, it is difficult to distinguish sequencing errors and PSVs from true SNPs. In 
maize, for example, although millions of SNPs have been identified, only a 
small portion of those polymorphisms could be used for the further development of robust and 
versatile assays [28]. To this end, we explored a simple strategy to develop SNPs from the non-model 
species P. ginseng by performing a BLAST homology search against the constructed SCNG library of 
Arabidopsis. Accordingly, a total of 22,824 Panax ESTs were analyzed, and 542 of them showed high 
similarity to the Arabidopsis SCNG references. Forty-five primer pairs were designed from the 
exon regions of Panax SCNGs, of which 16 primer pairs produced clear amplicons of the expected 
size in P. ginseng (Table 1), and ten of these also amplified successfully in P. quinquefolium (Table 2). 
To verify that the SNPs were retrieved from orthologous genes, we analyzed the 
genetic divergence of all the clones obtained for each putative SCNG. As expected, only a small 
number of sites showed single nucleotide variation, and almost all of the retrieved SNPs were found at 
these sites. These attributes suggest that the 16 nuclear genes are likely single copy nuclear genes in 
P. ginseng and P. quinquefolium. By screening DNA polymorphisms in the 16 SCNGs, 
we identified 160 and 177 SNPs from P. quinquefolium and P. ginseng, respectively. 
In addition, the obtained SCNG sequences produced alignments ranging from 278 to 
1,339 base pairs (bp) in P. ginseng and from 286 to 853 bp in P. quinquefolium. All of these DNA sequences 
have been submitted to GenBank under accession numbers KF529139-KF529528. 
Table 1. Nucleotide diversity of the 16 single copy nuclear genes in Panax ginseng. 

Locus | Primer sequences (5'-3') | Exon (bp) | Intron (bp) | T_a (°C) | S | h | H_d | π_T | π_Non | Function annotation
PGN7 | F: CCCAATGCCCCCAGAGTTTT; R: [illegible] | 441 | 336 | 54 | 21 | 6 | 0.779 | 0.011 | 0.003 | beta-amyrin synthase
PW2 | F: AGCACAAGCTCAAGCGTCTC; R: [illegible] | 63 | 269 | 48 | 5 | 12 | 0.947 | 0.007 | 0.015 | 40S ribosomal protein S27
PW8 | F: ATAGCTCGTGTAACTGATGG; R: [illegible] | 119 | 555 | 64 | 30 | 10 | 0.926 | 0.000 | 0.000 | vesicle transport protein
PW16 | F: ATTGGTGGAGGGAAGGAACT; R: [illegible] | 170 | 278 | 52 | 7 | 9 | 0.905 | 0.009 | 0.004 | prolyl-tRNA synthetase
PW21 | F: AAAAGGTTGGCTACGAGTGG; R: [illegible] | 146 | 140 | 64 | 2 | 3 | 0.658 | 0.003 | 0.000 | photosystem I reaction center
PW28 | F: GGGGTGGGAATTTGGAAGTA; R: [illegible] | 155 | 205 | 60 | 15 | 2 | 0.526 | 0.022 | 0.017 | photosystem I reaction center
PZ7 | F: ACCTGGTTCGCTGCTATTCC; R: [illegible] | 97 | 304 | 52 | 2 | 3 | 0.468 | 0.001 | 0.000 | PGR5-like protein 1A
PZ12 | F: GAGCGTTCTCAAATGCGGTAG; R: [illegible] | 118 | 830 | 54 | 1 | 2 | 0.100 | 0.001 | 0.000 | 60S ribosomal protein
PZ15 | F: TGAACAGGCATTATTACTCG; R: [illegible] | 105 | 653 | 48 | 0 | 1 | 0.000 | 0.000 | 0.000 | 26S proteasome non-ATPase
PZ14 | F: CTTTGTTTCTCCTCCTCCAG; R: [illegible] | 178 | 667 | 54 | 11 | 4 | 0.621 | 0.007 | 0.000 | diacylglycerol kinase 1
PZ10 | F: CTATGATGGGGTCTGGAGGG; R: [illegible] | 305 | 445 | 62 | 32 | 8 | 0.853 | 0.023 | 0.013 | glycine decarboxylase
PZ13 | F: AGCAGCCGAGTATGAAACCC; R: [illegible] | 174 | 845 | 56 | 0 | 1 | 0.000 | 0.000 | 0.000 | signal peptidase complex subunit [illegible]
PZ1 | F: CACTACCCCGTTCTTTTCCG; R: CCTTTTGTTCCTCAACCACC | 372 | 967 | 60 | 25 | 6 | 0.768 | 0.010 | 0.009 | glycine decarboxylase P-protein
PZ4 | F: TGTTGACCATCTACTCACCCAG; R: CCTTCACGCATTCCCACAAT | 207 | 426 | 48 | 7 | 8 | 0.895 | 0.006 | 0.000 | hypothetical protein
PZ5 | F: TGACGGACTTGACCTAACAT; R: CTTCAGATACAGCCCACAGC | 171 | 707 | 56 | 12 | 4 | 0.621 | 0.007 | 0.000 | ABC transporter F family member 3-like
PZ8 | F: GGGAAGGAAAAGTTGCTCTG; R: TATTCGTGTTGGGGCATCTG | 196 | 545 | 60 | 7 | 4 | 0.779 | 0.005 | 0.000 | hypothetical protein

T_a, annealing temperature; S, number of segregating sites; h, number of haplotypes; H_d, haplotype diversity; π_T, nucleotide diversity for total sites; 
π_Non, nucleotide diversity for nonsynonymous sites. 
Table 2. Nucleotide diversity of the ten single copy nuclear genes in Panax quinquefolium. 

Locus | Exon (bp) | Intron (bp) | T_a (°C) | S | h | H_d | π_T | π_Non
PGN7 | 441 | 338 | 54 | 23 | 7 | 0.964 | 0.010 | 0.001
PW2 | 60 | 266 | 48 | 15 | 8 | 0.956 | 0.027 | 0.079
PW16 | 153 | 290 | 40 | 27 | 9 | 0.978 | 0.035 | 0.033
PW21 | 152 | 134 | 60 | 16 | 2 | 0.556 | 0.031 | 0.047
PZ7 | 98 | 290 | 52 | 11 | 4 | 0.733 | 0.016 | 0.015
PZ15 | 96 | 665 | 48 | 0 | 1 | 0.000 | 0.000 | 0.000
PZ14 | 178 | 675 | 49 | 9 | 5 | 0.800 | 0.005 | 0.004
PZ10 | 302 | 445 | 60 | 26 | 2 | 0.556 | 0.019 | 0.021
PZ5 | 204 | 516 | 46 | 26 | 7 | 0.911 | 0.020 | 0.040
PZ8 | 138 | 477 | 60 | 7 | 4 | 0.822 | 0.006 | 0.000

T_a, annealing temperature; S, number of segregating sites; h, number of haplotypes; H_d, haplotype diversity; 
π_T, nucleotide diversity for total sites; π_Non, nucleotide diversity for nonsynonymous sites. 
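
As a quick consistency check, the per-locus numbers of segregating sites (S) in Tables 1 and 2 can be summed and compared with the SNP totals reported in Section 2.1. The sketch below simply hard-codes the S values as transcribed from the two tables; the variable and function names are illustrative and are not part of the original analysis.

```python
# Segregating sites (S) per locus, transcribed from Table 1 (P. ginseng)
# and Table 2 (P. quinquefolium). Names are illustrative only.
S_GINSENG = {
    "PGN7": 21, "PW2": 5, "PW8": 30, "PW16": 7, "PW21": 2, "PW28": 15,
    "PZ7": 2, "PZ12": 1, "PZ15": 0, "PZ14": 11, "PZ10": 32, "PZ13": 0,
    "PZ1": 25, "PZ4": 7, "PZ5": 12, "PZ8": 7,
}
S_QUINQUEFOLIUM = {
    "PGN7": 23, "PW2": 15, "PW16": 27, "PW21": 16, "PZ7": 11,
    "PZ15": 0, "PZ14": 9, "PZ10": 26, "PZ5": 26, "PZ8": 7,
}


def total_snps(s_per_locus):
    """Sum the segregating sites over all loci of one species."""
    return sum(s_per_locus.values())


print(total_snps(S_GINSENG))        # 177
print(total_snps(S_QUINQUEFOLIUM))  # 160
```

The two sums agree with the totals of 177 and 160 SNPs given in Section 2.1 for P. ginseng and P. quinquefolium, respectively.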



2.2. Nucleotide Diversity in P. ginseng and P. quinquefolium 

These SNPs can be employed to investigate the molecular phylogenetics, population genetics and 
molecular breeding of Panax species. For example, although several previous studies have 
employed allozyme, random amplified polymorphic DNA (RAPD), inter-simple sequence repeat 
(ISSR), amplified fragment length polymorphism (AFLP) and microsatellite techniques to 
investigate the genetic diversity of P. ginseng, these genetic markers are largely from unknown regions 
of the genome and cannot readily be compared among laboratories, which may limit their practical value in 
further studies [14,15,29]. In this study, we applied these SNPs to evaluate the nucleotide diversity of 
P. ginseng and P. quinquefolium. Results from the polymorphic loci of P. ginseng revealed that 
nucleotide diversity ranged from 0.001 to 0.023 for total sites (π_T) and from 0.000 to 0.017 for 
nonsynonymous sites (π_Non) (Table 1). Similarly, nucleotide diversity of P. quinquefolium 
varied from 0.005 to 0.035 for total sites and from 0.000 to 0.079 for nonsynonymous sites 
(Table 2). Genetic diversity based on SNP markers has also been reported in other crop 
plants. For example, Haudry et al. [30] employed 21 nuclear genes to investigate the genetic 
diversity of Triticum turgidum ssp. dicoccum and revealed that this species possessed low genetic 
diversity (π_T = 0.0008). Likewise, low genetic diversity was also found in Zea mays ssp. mays 
(π_T = 0.0064) and Hordeum vulgare (π_T = 0.0031) [31,32]. In comparison with these previous studies, 
our results showed that, although only a small number of individuals of P. ginseng and P. quinquefolium 
were investigated, both Panax species exhibited a relatively high level of nucleotide 
diversity at both total (π_T = 0.007 and 0.017 for P. ginseng and P. quinquefolium, respectively) and 
nonsynonymous (π_Non = 0.004 and 0.024 for P. ginseng and P. quinquefolium, respectively) sites. 
Notably, P. quinquefolium showed higher genetic diversity at both total and 
nonsynonymous sites than P. ginseng. This suggests that P. ginseng might have undergone 
a genetic bottleneck during the domestication process. In addition, Wang et al. [33] developed an 
amplification refractory mutation system (ARMS)-PCR method and successfully applied it to identify 
ginseng cultivars. Here, our results showed that no haplotypes were shared between P. ginseng and 
P. quinquefolium, suggesting that these molecular markers could be employed to distinguish the two 
Panax species. 
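
The species-level diversity values quoted above can be related back to the per-locus values in Tables 1 and 2. The sketch below takes the unweighted mean of the tabulated per-locus π values for each species. This is an assumption made for illustration: the Experimental Section states that diversity was also estimated for the combined dataset in DnaSP, which is not necessarily the same calculation as a simple mean over loci.

```python
# Per-locus (pi_T, pi_Non) pairs transcribed from Table 1 (P. ginseng, 16 loci)
# and Table 2 (P. quinquefolium, 10 loci). Names are illustrative only.
PI_GINSENG = [
    (0.011, 0.003), (0.007, 0.015), (0.000, 0.000), (0.009, 0.004),
    (0.003, 0.000), (0.022, 0.017), (0.001, 0.000), (0.001, 0.000),
    (0.000, 0.000), (0.007, 0.000), (0.023, 0.013), (0.000, 0.000),
    (0.010, 0.009), (0.006, 0.000), (0.007, 0.000), (0.005, 0.000),
]
PI_QUINQUEFOLIUM = [
    (0.010, 0.001), (0.027, 0.079), (0.035, 0.033), (0.031, 0.047),
    (0.016, 0.015), (0.000, 0.000), (0.005, 0.004), (0.019, 0.021),
    (0.020, 0.040), (0.006, 0.000),
]


def mean_diversity(per_locus):
    """Return the unweighted means of pi_T and pi_Non, rounded to 3 decimals."""
    n = len(per_locus)
    mean_total = sum(pi_t for pi_t, _ in per_locus) / n
    mean_nonsyn = sum(pi_non for _, pi_non in per_locus) / n
    return round(mean_total, 3), round(mean_nonsyn, 3)


print(mean_diversity(PI_GINSENG))        # (0.007, 0.004)
print(mean_diversity(PI_QUINQUEFOLIUM))  # (0.017, 0.024)
```

At three decimal places, these simple means coincide with the values reported in the text for both species.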

3. Experimental Section 

3.1. Samples and DNA Extraction 

SNP discovery was assessed in samples from 20 individuals of P. ginseng and ten individuals of 
P. quinquefolium. Detailed information on the specimens is listed in Table 3. The 
20 P. ginseng samples were collected from ten localities, with two individuals per locality. Similarly, 
the ten individuals of P. quinquefolium were obtained from two localities. Genomic DNA was 
extracted from leaves of each individual using a Plant Genomic DNA kit (TianGen, Beijing, China) 
following the manufacturer's protocols. 



Table 3. Details of localities sampled from the field in this study and number of individuals 
sequenced for each locus in each locality. 

Species name | Locality | Country | Latitude/longitude | Elevation (meter) | Number of individuals | Sampling date | Voucher specimens
P. ginseng | TQL | China | 43°36'129"N, 129°35'807"E | 469 | 2 | 9/2011 | NENU20110902001
P. ginseng | WHL | China | 43°30'181"N, 127°54'193"E | 551 | 2 | 9/2011 | NENU20110903001
P. ginseng | FS | China | 42°24'216"N, 127°12'186"E | 589 | 2 | 7/2011 | NENU20110720001
P. ginseng | JY | China | 42°23'197"N, 126°48'490"E | 612 | 2 | 7/2011 | NENU20110713001
P. ginseng | XJD | China | 42°20'870"N, 128°44'449"E | 845 | 2 | 9/2011 | NENU20110902002
P. ginseng | CB | China | 41°39'442"N, 127°35'229"E | 936 | 2 | 9/2011 | NENU20110802001
P. ginseng | LJ | China | 41°48'432"N, 126°55'530"E | 663 | 2 | 8/2011 | NENU20110801001
P. ginseng | BT | China | 41°18'492"N, 125°49'954"E | 369 | 2 | 7/2011 | NENU20110718006
P. ginseng | SZ | China | 40°45'595"N, 125°20'863"E | 375 | 2 | 7/2011 | NENU20110718001
P. ginseng | GL | China | 41°25'121"N, 128°12'296"E | 765 | 2 | 9/2012 | NENU20130929001
P. quinquefolium | WS | USA | n.a. | n.a. | 5 | 9/2012 | n.a.
P. quinquefolium | JY | China | 42°23'197"N, 126°48'490"E | 612 | 5 | 7/2011 | NENU20110713009

n.a., data not available. 
3.2. SCNG Library Construction and Primer Design 

To obtain SNPs from P. ginseng and P. quinquefolium, a library of SCNGs was constructed based on 
the database of putative SCNGs (see Supplementary File 1). In detail, the available Arabidopsis 
reference sequences were retrieved from GenBank according to the accession numbers of Duarte et al. [34]. 
Then, all ESTs and genomic sequences of Panax were downloaded from GenBank and aligned against 
the constructed Arabidopsis SCNG library using the Basic Local Alignment Search Tool (BLAST). 
Aligned EST sequences with a minimum matched query length of 200 nucleotides and a minimum identity of 80% 
were considered valid hits. To further identify the gene structures of the SCNGs in P. ginseng, we searched 
these ESTs against GenBank using BLASTX (http://www.ncbi.nlm.nih.gov/dbEST/index.html) [35]. 
The exon-intron boundaries of the SCNGs were determined from the available annotated references. 
Primers were designed from the identified Panax ESTs using the software Primer Premier 5.0 
(Premier Biosoft International, Palo Alto, CA, USA). 
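
The hit-filtering step described in this section can be sketched as follows. The example assumes that the BLAST results are available in the standard 12-column tabular layout (query ID, subject ID, percent identity, alignment length, and so on), which the paper does not specify, and it uses the alignment length column as a stand-in for the "matched query length". The sequence identifiers and records are hypothetical.

```python
from typing import Iterable, List, NamedTuple

MIN_ALIGNED_LENGTH = 200  # nucleotides; threshold stated in the text
MIN_IDENTITY = 80.0       # percent; threshold stated in the text


class Hit(NamedTuple):
    query: str       # Panax EST identifier
    subject: str     # Arabidopsis SCNG reference identifier
    identity: float  # percent identity of the alignment
    length: int      # alignment length in nucleotides


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated BLAST records, skipping blank and comment lines."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[2]), int(fields[3])))
    return hits


def valid_hits(hits: Iterable[Hit]) -> List[Hit]:
    """Keep hits that meet both the length and the identity thresholds."""
    return [
        h for h in hits
        if h.length >= MIN_ALIGNED_LENGTH and h.identity >= MIN_IDENTITY
    ]


# Hypothetical records, for illustration only.
EXAMPLE_LINES = [
    "est_A\tscng_1\t91.30\t310\t27\t0\t1\t310\t55\t364\t1e-120\t420",
    "est_B\tscng_1\t78.50\t450\t97\t0\t1\t450\t10\t459\t3e-90\t330",
    "est_C\tscng_2\t88.00\t150\t18\t0\t1\t150\t5\t154\t2e-40\t160",
]

kept = valid_hits(parse_tabular(EXAMPLE_LINES))
print([h.query for h in kept])  # ['est_A']
```

Here est_B fails the identity threshold and est_C fails the length threshold, so only est_A would be carried forward to primer design.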

3.3. PCR, Sequencing and Gene Function Prediction 

The designed primer pairs were then used to amplify the target fragments from P. ginseng and 
P. quinquefolium. PCRs were performed using an ABI 2720 Thermocycler (Applied Biosystems, 
Foster City, CA, USA) in a 30 µL total volume containing 20-50 ng template DNA, 1× PCR buffer 
(Mg²⁺ free), 2.5 mM Mg²⁺, 0.6 µM of each primer, 0.2 mM of each dNTP and 1 unit of rTaq polymerase 
(Takara, Dalian, Liaoning, China). The amplifications were performed under the following conditions: 
94 °C for 5 min; 35 cycles of 30 s at 94 °C, 30 s at the annealing temperature of each 
specific primer pair (Tables 1 and 2) and 90 s at 72 °C; and a final extension at 72 °C for 8 min. All amplified 
products were separated by electrophoresis on 1.5% agarose gels, purified with the Gel DNA 
Recovery Kit (Takara) following the manufacturer's instructions and sequenced on an ABI3730 
sequencer (Beijing Invitrogen Biotechnology Co., Ltd., Beijing, China). Previous studies have 
documented that P. ginseng and P. quinquefolium are tetraploid species [36-38]. To ensure that all the 
SNPs were retrieved from orthologous genes, we therefore sequenced more than 10 clones 
from the same individual for each putative SCNG and analyzed the genomic divergence of the 
obtained sequences. To further determine the function of the SCNGs, the obtained genomic sequences 
were searched against the GenBank non-redundant protein database of Arabidopsis thaliana using 
BLASTX [35] with an expected value < 10⁻⁷. The putative functions of these SCNGs are listed in 
Table 1. 
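
For readers who wish to set up the reaction described in this section, the final concentrations can be converted into pipetting volumes with the usual dilution relation C1·V1 = C2·V2. The sketch below is only an illustration: the final concentrations, reaction volume and cycling times come from the text, whereas all stock concentrations are assumptions and should be replaced with those of the reagents actually used.

```python
REACTION_VOLUME_UL = 30.0  # total reaction volume stated in the text


def volume_ul(final_conc, stock_conc, total_ul=REACTION_VOLUME_UL):
    """Solve C1 * V1 = C2 * V2 for the stock volume V1 (same units for C1, C2)."""
    return final_conc * total_ul / stock_conc


# Final concentrations are from the text; every stock concentration is assumed.
mix_ul = {
    "PCR buffer (10x stock, assumed)": volume_ul(1.0, 10.0),
    "Mg2+ (25 mM stock, assumed)": volume_ul(2.5, 25.0),
    "forward primer (10 uM stock, assumed)": volume_ul(0.6, 10.0),
    "reverse primer (10 uM stock, assumed)": volume_ul(0.6, 10.0),
    "dNTPs (2.5 mM each stock, assumed)": volume_ul(0.2, 2.5),
    "rTaq (5 U/uL stock, assumed)": 1.0 / 5.0,  # 1 unit per reaction
}

# Whatever volume is left is shared by the template DNA (20-50 ng) and water.
template_plus_water_ul = REACTION_VOLUME_UL - sum(mix_ul.values())


def program_seconds(cycles=35):
    """Hold times of the cycling program in the text, excluding ramping."""
    initial_denaturation = 5 * 60
    per_cycle = 30 + 30 + 90  # denaturation, annealing, extension
    final_extension = 8 * 60
    return initial_denaturation + cycles * per_cycle + final_extension


for reagent, vol in mix_ul.items():
    print(f"{reagent}: {vol:.1f} uL")
print(f"template DNA + water: {template_plus_water_ul:.1f} uL")
print(f"program hold time: {program_seconds() / 60:.1f} min")
```

With these assumed stocks, the reagents account for about 12.2 µL, leaving about 17.8 µL for template and water, and the hold times of the program add up to 100.5 min.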

3.4. SNP Genotyping and Data Analyses 

The DNA sequences obtained from P. ginseng and P. quinquefolium were subsequently screened to 
identify SNPs. Initial sequence editing and assembly were performed using ContigExpress 
(Informax Inc., North Bethesda, MD, USA, 2000). DNA sequence alignment was implemented in 
ClustalX 1.83 [39] and, if necessary, edited manually in BioEdit 7.0.1 [40]. To evaluate the nucleotide 
diversity of the two Panax species, nucleotide polymorphisms were analyzed using DnaSP version 5 [41], 
including the number of segregating sites (S), number of haplotypes (h) and haplotype diversity (H_d). In 
addition, we surveyed nucleotide diversity π [42] for total and nonsynonymous sites for each locus 
and for the combined dataset separately. Insertions/deletions (indels) were not included in these analyses. 
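
The summary statistics named in this section can be illustrated on a toy alignment. The sketch below uses the standard textbook definitions: S is the number of variable alignment columns, h is the number of distinct sequences, H_d = n/(n-1) · (1 - Σ p_i²) for haplotype frequencies p_i, and π is the mean number of pairwise differences per site. It is not a reimplementation of DnaSP; the function names and example sequences are hypothetical, and π for nonsynonymous sites, which requires codon-aware site counting, is not covered.

```python
from collections import Counter
from itertools import combinations
from typing import List


def drop_gap_columns(seqs: List[str]) -> List[str]:
    """Remove alignment columns that contain a gap, so indels are excluded."""
    keep = [col for col in zip(*seqs) if "-" not in col]
    return ["".join(col[i] for col in keep) for i in range(len(seqs))]


def segregating_sites(seqs: List[str]) -> int:
    """S: number of alignment columns with more than one nucleotide state."""
    return sum(1 for col in zip(*seqs) if len(set(col)) > 1)


def haplotype_count(seqs: List[str]) -> int:
    """h: number of distinct sequences in the sample."""
    return len(set(seqs))


def haplotype_diversity(seqs: List[str]) -> float:
    """H_d = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs: List[str]) -> float:
    """pi: mean number of pairwise differences, divided by alignment length."""
    n, length = len(seqs), len(seqs[0])
    diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    return diffs / (n * (n - 1) / 2) / length


# Hypothetical four-sequence alignment, for illustration only.
alignment = drop_gap_columns(["ACGT", "ACGA", "ACGT", "TCGA"])
print(segregating_sites(alignment))               # 2
print(haplotype_count(alignment))                 # 3
print(round(haplotype_diversity(alignment), 3))   # 0.833
print(round(nucleotide_diversity(alignment), 3))  # 0.292
```

In the toy alignment, seven pairwise differences over six sequence pairs and four sites give π ≈ 0.292, and three haplotypes with frequencies 0.5, 0.25 and 0.25 give H_d ≈ 0.833.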

4. Conclusions 

SNPs are increasingly being used as ideal molecular markers in both model and non-model 
species. Here, we explored an approach for developing SNPs from the SCNGs of the non-model species 
P. ginseng and P. quinquefolium. Our results suggest that this strategy could also be applied to 
develop SNPs in other model or non-model species. 